AITopics | entropy number

entropy number

Information about AI from the News, Publications, and Conferences

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Reliable Estimation of KL Divergence using a Discriminator in Reproducing Kernel Hilbert Space: Supplementary Material

Neural Information Processing Systems, Apr-25-2026, 23:05:59 GMT

Organization: This supplementary material is presented in a format parallel to the main paper. The section numbers and titles are consistent with the main paper, but here we also add one new section, Section 10, where we describe the societal impacts of the paper, including possible negative impacts. Similarly, the theorem numbers are consistent with the main paper, but we also include several additional theorems and lemmas that were not included in the main paper. GAN-type Objective for KL Estimation: Let $f$ be a discriminator, $f: \mathcal{X} \to \mathbb{R}$, and let $p(x)$ and $q(x)$ be two probability density functions defined over the space $\mathcal{X}$.
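
The excerpt stops at the setup, so the objective itself is not shown here. One standard discriminator-based objective, assumed here purely for illustration, is the Donsker-Varadhan representation $\mathrm{KL}(p\,\|\,q) = \sup_f \{\mathbb{E}_p[f] - \log \mathbb{E}_q[e^{f}]\}$, whose supremum is attained at $f = \log(p/q)$ plus any constant. The sketch below evaluates this bound from samples for two 1-D Gaussians (a hypothetical test case), once with the exact log density ratio and once with a deliberately scaled-down discriminator, and compares both with the closed-form KL. It does not reproduce the paper's RKHS discriminator.

```python
import numpy as np

def gaussian_kl(mu_p, sd_p, mu_q, sd_q):
    """Closed-form KL(p || q) for p = N(mu_p, sd_p^2) and q = N(mu_q, sd_q^2)."""
    return np.log(sd_q / sd_p) + (sd_p**2 + (mu_p - mu_q) ** 2) / (2 * sd_q**2) - 0.5

def log_normal_pdf(x, mu, sd):
    """Log density of N(mu, sd^2) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * sd**2) - (x - mu) ** 2 / (2 * sd**2)

def dv_bound(f, xs_p, xs_q):
    """Monte Carlo estimate of the Donsker-Varadhan bound E_p[f] - log E_q[exp(f)]."""
    return np.mean(f(xs_p)) - np.log(np.mean(np.exp(f(xs_q))))

# Hypothetical test case: p = N(1, 1), q = N(0, 1), so KL(p || q) = 0.5.
mu_p, sd_p, mu_q, sd_q = 1.0, 1.0, 0.0, 1.0
rng = np.random.default_rng(0)
xs_p = rng.normal(mu_p, sd_p, size=200_000)
xs_q = rng.normal(mu_q, sd_q, size=200_000)

def f_opt(x):
    # Optimal discriminator: the log density ratio log p(x) - log q(x).
    return log_normal_pdf(x, mu_p, sd_p) - log_normal_pdf(x, mu_q, sd_q)

def f_half(x):
    # A suboptimal discriminator; its bound should come out lower.
    return 0.5 * f_opt(x)

exact = gaussian_kl(mu_p, sd_p, mu_q, sd_q)
est_opt = dv_bound(f_opt, xs_p, xs_q)
est_half = dv_bound(f_half, xs_p, xs_q)
print(f"exact={exact:.4f}  DV(f_opt)={est_opt:.4f}  DV(f_half)={est_half:.4f}")
```

For this pair the population values of the bound are 0.5 for `f_opt` and 0.375 for `f_half` (both follow analytically from $f_{\mathrm{opt}}(x) = x - 1/2$), so the Monte Carlo estimates should land close to those, with the suboptimal discriminator strictly below the true KL.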

artificial intelligence, dim, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Reviews: Fast learning rates with heavy-tailed losses

Neural Information Processing Systems, Jan-20-2025, 12:06:15 GMT

This paper provides some new results in an important area that is receiving more and more attention: fast rates when loss functions are unbounded and heavy-tailed. Existing results based on empirical process theory often rely on bounded or sub-Gaussian losses, and the heavy-tailed (hence non-sub-Gaussian) case is considerably harder. The results presented seem sound and are definitely novel. They rely on results of Sara van de Geer and collaborators on concentration inequalities for unbounded empirical processes. The material is very technical, and I would suggest moving even more of it to the appendix.

audibert, bernstein condition, heavy-tailed loss

Genre: Research Report (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Neural networks: deep, shallow, or in between?

Petrova, Guergana, Wojtaszczyk, Przemyslaw

arXiv.org Machine Learning, Oct-11-2023

The fascinating new developments in the area of Artificial Intelligence (AI) and other important applications of neural networks prompt the need for a theoretical mathematical study of their potential to reliably approximate complicated objects. Various network architectures have been used in different applications with substantial success rates, without significant theoretical backing for the choices made. Thus, a natural question to ask is whether and how the chosen architecture affects the approximation power of the outputs of the resulting neural network. In this paper, we attempt to clarify how the width and the depth of a feed-forward neural network affect its worst-case performance. More precisely, we provide lower bounds for the error of approximation of a compact subset $K \subset X$ of a Banach space $X$ by the outputs of feed-forward neural networks (NNs) with width $W$, depth $l$, bound $w(W,l)$ on their parameters, and Lipschitz activation functions. Note that the ReLU function is included in our investigation, since it is a Lipschitz function with Lipschitz constant $L = 1$. To prove our results, we assume that we know lower bounds on the entropy numbers of the compact sets $K$ that we approximate by the outputs of feed-forward NNs.
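
The argument runs through entropy numbers. Under one common convention (assumed here; sources differ in the exact indexing), $e_n(K)_X$ is the smallest $\varepsilon$ such that $K$ can be covered by $2^n$ balls of radius $\varepsilon$ in $X$. The sketch below, which is not from the paper, estimates this for a finite sample of $K \subset \mathbb{R}^d$ with the Euclidean norm, using greedy farthest-point (Gonzalez) center selection; that heuristic's covering radius is guaranteed to be within a factor of 2 of the optimal radius for the sampled points. The function names are hypothetical. For $K = [0,1]$ the exact value under this convention is $2^{-(n+1)}$, which gives a simple check.

```python
import numpy as np

def greedy_cover_radius(points, k):
    """Farthest-point (Gonzalez) heuristic: choose k centers among `points` and
    return the radius needed for balls around them to cover all points."""
    dist = np.linalg.norm(points - points[0], axis=1)  # first center: points[0]
    for _ in range(k - 1):
        far = int(np.argmax(dist))  # the worst-covered point becomes the next center
        dist = np.minimum(dist, np.linalg.norm(points - points[far], axis=1))
    return float(dist.max())

def entropy_number_estimate(points, n):
    """Upper estimate of e_n for the sampled set: covering radius with 2**n balls."""
    return greedy_cover_radius(points, 2 ** n)

h = 1e-4
K = np.linspace(0.0, 1.0, int(round(1 / h)) + 1).reshape(-1, 1)  # K = [0, 1] on a grid
for n in range(1, 6):
    # columns: n, greedy estimate, exact e_n([0, 1]) = 2**-(n + 1)
    print(n, entropy_number_estimate(K, n), 2.0 ** -(n + 1))
```

On this example the greedy radius sits between the exact value (up to the grid spacing) and twice the exact value, which is exactly what the factor-2 guarantee predicts.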

approximation, artificial intelligence, machine learning

2310.0719

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Poland (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Limitations on approximation by deep and shallow neural networks

Petrova, Guergana, Wojtaszczyk, Przemysław

arXiv.org Artificial Intelligence, Nov-30-2022

Since neural network approximation is the method of choice for building numerical algorithms in many application areas, it is important to understand not only how well neural networks approximate but also what lower bounds limit their approximation power. In this paper, we study the limitations of deep and shallow neural networks in approximating a compact subset $K \subset X$ of a Banach space $X$ when the parameters in the approximation procedure are required to satisfy certain bounds. This is done by proving appropriate Carl-type inequalities that relate the error of neural network approximation of $K$ to the entropy numbers of this set. We consider feed-forward neural networks (NNs) with ReLU or Lipschitz sigmoidal activation functions, width $W \geq 2$ and depth $n$, whose parameters have absolute values bounded by a given function $w(n)$. We prove that the capability of these networks to approximate any compact subset $K$ is limited by the behavior of its entropy numbers.
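
For orientation, the classical inequality of Carl compares entropy numbers with the Kolmogorov widths $d_n(K)_X$, the best worst-case error achievable by $n$-dimensional linear subspaces. It is stated here as known background, not as the paper's theorem, and index conventions vary slightly between sources. For every $\alpha > 0$ there is a constant $C_\alpha$ such that

```latex
d_n(K)_X := \inf_{\dim X_n = n} \; \sup_{f \in K} \operatorname{dist}(f, X_n)_X,
\qquad
\sup_{1 \le k \le n} k^{\alpha} e_k(K)_X \;\le\; C_\alpha \sup_{1 \le k \le n} k^{\alpha} d_{k-1}(K)_X,
\quad n \ge 1.
```

In particular, if $e_n(K)_X$ is not $O(n^{-\alpha})$, then $d_n(K)_X$ is not $O(n^{-\alpha})$ either. The inequalities described in the abstract play the same role, with the error of bounded-parameter neural network approximation in place of the linear widths.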

artificial intelligence, log 2, machine learning

2212.02223

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Poland (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

$L^p$ sampling numbers for the Fourier-analytic Barron space

Voigtlaender, Felix

arXiv.org Artificial Intelligence, Aug-16-2022

In this paper, we consider Barron functions $f : [0,1]^d \to \mathbb{R}$ of smoothness $\sigma > 0$, which are functions that can be written as \[ f(x) = \int_{\mathbb{R}^d} F(\xi) \, e^{2 \pi i \langle x, \xi \rangle} \, d \xi \quad \text{with} \quad \int_{\mathbb{R}^d} |F(\xi)| \cdot (1 + |\xi|)^{\sigma} \, d \xi < \infty. \] For $\sigma = 1$, these functions play a prominent role in machine learning, since they can be efficiently approximated by (shallow) neural networks without suffering from the curse of dimensionality. For these functions, we study the following question: Given $m$ point samples $f(x_1),\dots,f(x_m)$ of an unknown Barron function $f : [0,1]^d \to \mathbb{R}$ of smoothness $\sigma$, how well can $f$ be recovered from these samples, for an optimal choice of the sampling points and the reconstruction procedure? Denoting the optimal reconstruction error measured in $L^p$ by $s_m (\sigma; L^p)$, we show that \[ m^{- \frac{1}{\max \{ p,2 \}} - \frac{\sigma}{d}} \lesssim s_m(\sigma;L^p) \lesssim (\ln (e + m))^{\alpha(\sigma,d) / p} \cdot m^{- \frac{1}{\max \{ p,2 \}} - \frac{\sigma}{d}} , \] where the implied constants only depend on $\sigma$ and $d$ and where $\alpha(\sigma,d)$ stays bounded as $d \to \infty$.
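
Ignoring the logarithmic factor in the upper bound, the two-sided estimate says that $s_m(\sigma;L^p)$ decays like $m^{-\beta}$ with $\beta = \frac{1}{\max\{p,2\}} + \frac{\sigma}{d}$. The helper below (a hypothetical name, not from the paper) simply evaluates this exponent, which makes two consequences of the formula easy to see: every $p \le 2$ gives the same exponent, and as $d$ grows the exponent approaches $1/\max\{p,2\}$.

```python
def sampling_rate_exponent(p: float, sigma: float, d: int) -> float:
    """Exponent beta in s_m(sigma; L^p) ~ m**(-beta), ignoring the logarithmic factor."""
    return 1.0 / max(p, 2.0) + sigma / d

inf = float("inf")
for d in (1, 10, 100, 1000):
    # columns: d, exponent for p = 2, exponent for p = infinity
    print(d, sampling_rate_exponent(2, 1.0, d), sampling_rate_exponent(inf, 1.0, d))
```

For $p = \infty$ the formula gives the exponent $\sigma/d$, which degrades with dimension, while for $p \le 2$ the exponent never drops below $1/2$.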

algorithm, barron space, equation

2208.07605

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Ingolstadt (0.04)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Optimal learning of high-dimensional classification problems using deep neural networks

Petersen, Philipp, Voigtlaender, Felix

arXiv.org Machine Learning, Dec-24-2021

We study the problem of learning classification functions from noiseless training samples, under the assumption that the decision boundary is of a certain regularity. We establish universal lower bounds for this estimation problem, for general classes of continuous decision boundaries. For the class of locally Barron-regular decision boundaries, we find that the optimal estimation rates are essentially independent of the underlying dimension and can be realized by empirical risk minimization methods over a suitable class of deep neural networks. These results are based on novel estimates of the $L^1$ and $L^\infty$ entropies of the class of Barron-regular functions.
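
To make the setting concrete: the labels are noiseless indicators of one side of a decision boundary, and the estimator minimizes empirical risk over a hypothesis class. The sketch below is a toy stand-in under stated assumptions: the boundary is a hypothetical graph $x_2 = g(x_1)$ on $[0,1]^2$, and the hypothesis class is a small grid of parametrized boundaries, not the deep neural networks analyzed in the paper. It only illustrates empirical risk minimization with the 0-1 loss on noiseless samples.

```python
import numpy as np

rng = np.random.default_rng(1)

def boundary(t, a, b):
    # Hypothetical family of decision boundaries x2 = a + b * sin(2 * pi * x1).
    return a + b * np.sin(2 * np.pi * t)

def classify(X, a, b):
    # Label 1 for points on or above the boundary, 0 below.
    return (X[:, 1] >= boundary(X[:, 0], a, b)).astype(int)

# Noiseless training data labelled by the "true" boundary a = 0.5, b = 0.2.
X_train = rng.uniform(size=(2000, 2))
y_train = classify(X_train, 0.5, 0.2)

def empirical_risk(a, b):
    # Fraction of training points misclassified (0-1 loss).
    return np.mean(classify(X_train, a, b) != y_train)

# Empirical risk minimization over a finite grid of candidate boundaries.
candidates = [(a, b) for a in np.linspace(0.3, 0.7, 41) for b in np.linspace(0.0, 0.4, 41)]
a_hat, b_hat = min(candidates, key=lambda ab: empirical_risk(*ab))

# Generalization check on fresh samples.
X_test = rng.uniform(size=(20000, 2))
test_error = np.mean(classify(X_test, a_hat, b_hat) != classify(X_test, 0.5, 0.2))
print(f"a_hat={a_hat:.2f} b_hat={b_hat:.2f} "
      f"train={empirical_risk(a_hat, b_hat):.4f} test={test_error:.4f}")
```

Because the true boundary lies in this toy class and the labels are noiseless, the minimizer reaches zero training error; the paper's question is how fast such estimators converge when the class must approximate a general regular boundary.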

boundary, decision boundary, neural network

2112.12555

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Optimal Approximation Rates and Metric Entropy of ReLU$^k$ and Cosine Networks

Siegel, Jonathan W., Xu, Jinchao

arXiv.org Machine Learning, Feb-9-2021

This article addresses several fundamental issues associated with the approximation theory of neural networks, including the characterization of approximation spaces, the determination of the metric entropy of these spaces, and approximation rates of neural networks. For any activation function $\sigma$, we show that the largest Banach space of functions which can be efficiently approximated by the corresponding shallow neural networks is the space whose norm is given by the gauge of the closed convex hull of the set $\{\pm\sigma(\omega\cdot x + b)\}$. We characterize this space for the ReLU$^k$ and cosine activation functions and, in particular, show that the resulting gauge space is equivalent to the spectral Barron space if $\sigma=\cos$ and is equivalent to the Barron space when $\sigma={\rm ReLU}$. Our main result establishes the precise asymptotics of the $L^2$-metric entropy of the unit ball of these gauge spaces and, as a consequence, the optimal approximation rates for shallow ReLU$^k$ networks. The sharpest previous results hold only in the special case that $k=0$ and $d=2$, where the metric entropy has been determined up to logarithmic factors. When $k > 0$ or $d > 2$, there is a significant gap between the previous best upper and lower bounds. We close all of these gaps and determine the precise asymptotics of the metric entropy for all $k \geq 0$ and $d\geq 2$, including removing the logarithmic factors previously mentioned. Finally, we use these results to quantify how much is lost by Barron's spectral condition relative to the convex hull of $\{\pm\sigma(\omega\cdot x + b)\}$ when $\sigma={\rm ReLU}^k$.
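
A shallow network $f(x) = \sum_i a_i\,\sigma(\omega_i \cdot x + b_i)$ lies in $\big(\sum_i |a_i|\big)$ times the convex hull of $\{\pm\sigma(\omega\cdot x + b)\}$, so its gauge norm is at most $\sum_i |a_i|$ once the dictionary elements are normalized. For $\sigma = \mathrm{ReLU}^k$ the normalization follows from positive homogeneity: $\mathrm{ReLU}^k(\omega\cdot x + b) = |\omega|^k\,\mathrm{ReLU}^k(\hat\omega\cdot x + b/|\omega|)$ with $\hat\omega = \omega/|\omega|$. The sketch below assumes a dictionary with unit-norm $\omega$ and does not check any restriction on the bias range (the paper's exact dictionary is not given in this excerpt); the function names are hypothetical. It rewrites a random network in normalized form, checks that the outputs agree, and returns the resulting upper bound $\sum_i |a_i|\,|\omega_i|^k$.

```python
import numpy as np

def relu_k(z, k):
    # ReLU^k activation; k = 0 is the Heaviside step (1 for z > 0, else 0).
    return (z > 0).astype(float) if k == 0 else np.maximum(z, 0.0) ** k

def shallow_net(X, a, W, b, k):
    # f(x) = sum_i a_i * ReLU^k(w_i . x + b_i), evaluated for each row x of X.
    return relu_k(X @ W.T + b, k) @ a

def normalize(a, W, b, k):
    # Rescale so every inner weight vector has unit Euclidean norm (positive homogeneity).
    norms = np.linalg.norm(W, axis=1)
    return a * norms**k, W / norms[:, None], b / norms

def gauge_norm_upper_bound(a, W, b, k):
    # sum_i |a_i| after normalization bounds the gauge norm (assumed unit-norm dictionary).
    a_n, _, _ = normalize(a, W, b, k)
    return float(np.sum(np.abs(a_n)))

rng = np.random.default_rng(2)
d, n, k = 3, 5, 2
a, W, b = rng.normal(size=n), rng.normal(size=(n, d)), rng.normal(size=n)
X = rng.normal(size=(100, d))

a_n, W_n, b_n = normalize(a, W, b, k)
print(np.allclose(shallow_net(X, a, W, b, k), shallow_net(X, a_n, W_n, b_n, k)))
print(gauge_norm_upper_bound(a, W, b, k))
```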

approximation rate, neural network, polynomial

2101.12365

Country:

North America > United States > Pennsylvania > Centre County > University Park (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Austria (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Learning Rates for Kernel-Based Expectile Regression

Farooq, Muhammad, Steinwart, Ingo

arXiv.org Machine Learning, Feb-27-2017

Conditional expectiles are becoming an increasingly important tool in finance as well as in other areas of applications. We analyse a support vector machine type approach for estimating conditional expectiles and establish learning rates that are minimax optimal modulo a logarithmic factor if Gaussian RBF kernels are used and the desired expectile is smooth in a Besov sense. As a special case, our learning rates improve the best known rates for kernel-based least squares regression in this scenario. Key ingredients of our statistical analysis are a general calibration inequality for the asymmetric least squares loss, a corresponding variance bound as well as an improved entropy number bound for Gaussian RBF kernels.
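
For reference, the $\tau$-expectile of $Y$ minimizes the asymmetric least squares loss $L_\tau(y,t) = |\tau - \mathbf{1}\{y < t\}|\,(y - t)^2$, so $\tau = 1/2$ recovers the mean. The sketch below is a plain illustration, not the paper's kernel-based estimator, and its function names are hypothetical. It computes a sample expectile by bisection on the first-order condition $\tau \sum_i (y_i - t)_+ = (1-\tau) \sum_i (t - y_i)_+$, whose left-hand side minus right-hand side is decreasing in $t$.

```python
import numpy as np

def sample_expectile(y, tau, tol=1e-12, max_iter=200):
    """tau-expectile of the sample y via bisection on the first-order condition."""
    y = np.asarray(y, dtype=float)
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")

    def h(t):
        # Decreasing in t; its root is the expectile.
        return tau * np.sum(np.maximum(y - t, 0.0)) - (1.0 - tau) * np.sum(np.maximum(t - y, 0.0))

    lo, hi = y.min(), y.max()  # h(lo) >= 0 >= h(hi), so the root is bracketed
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def asymmetric_squared_loss(y, t, tau):
    """Mean of |tau - 1{y < t}| * (y - t)^2 over the sample."""
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.abs(tau - (y < t)) * (y - t) ** 2))

y = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
for tau in (0.1, 0.5, 0.9):
    print(tau, sample_expectile(y, tau))
```

Bisection is used here because the first-order condition is monotone, so the bracket $[\min y, \max y]$ always contains the root; the $\tau = 0.5$ case reproduces the sample mean.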

artificial intelligence, expectile, machine learning

1702.07552

Country: Europe > Germany (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

The Entropy Regularization Information Criterion

Smola, Alex J., Shawe-Taylor, John, Schölkopf, Bernhard, Williamson, Robert C.

Neural Information Processing Systems, Dec-31-2000

Effective methods of capacity control via uniform convergence bounds for function expansions have been largely limited to support vector machines, where good bounds are obtainable by the entropy number approach. We extend these methods to systems with expansions in terms of arbitrary (parametrized) basis functions and a wide range of regularization methods covering the whole range of general linear additive models. This is achieved by a data-dependent analysis of the eigenvalues of the corresponding design matrix.

basis function, entropy number, operator

Country: